
    Logical Segmentation of Source Code

    Many software analysis methods have come to rely on machine learning approaches. Code segmentation - the process of decomposing source code into meaningful blocks - can augment these methods by featurizing code, reducing noise, and limiting the problem space. Traditionally, code segmentation has been done using syntactic cues; current approaches do not intentionally capture logical content. We develop a novel deep learning approach to generate logical code segments regardless of the language or syntactic correctness of the code. Due to the lack of logically segmented source code, we introduce a unique data set construction technique to approximate ground truth for logically segmented code. Logical code segmentation can improve tasks such as automatically commenting code, detecting software vulnerabilities, repairing bugs, labeling code functionality, and synthesizing new code. Comment: SEKE2019 Conference Full Paper
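
    To make the contrast with syntactic segmentation concrete, the following is a minimal sketch of a purely syntactic baseline that splits code at blank lines. It is an illustrative heuristic only; the function name and the blank-line rule are assumptions, and the paper's deep learning model learns logical boundaries instead.

```python
# Illustrative syntactic baseline only: split source text into segments at
# blank lines. The paper's approach learns *logical* boundaries instead and
# is language-agnostic; this heuristic is just a point of comparison.
def syntactic_segments(source: str) -> list[str]:
    segments, current = [], []
    for line in source.splitlines():
        if line.strip():                 # non-empty line: extend current block
            current.append(line)
        elif current:                    # blank line: close the current block
            segments.append("\n".join(current))
            current = []
    if current:
        segments.append("\n".join(current))
    return segments

example = "import os\n\ndef f(x):\n    return x + 1\n\nprint(f(2))\n"
print(syntactic_segments(example))       # three blocks: import, function, call
```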

    That Escalated Quickly: An ML Framework for Alert Prioritization

    In place of in-house solutions, organizations are increasingly moving towards managed services for cyber defense. Security Operations Centers (SOCs) are specialized cybersecurity units responsible for the defense of an organization, but the large-scale centralization of threat detection is causing SOCs to endure an overwhelming amount of false positive alerts -- a phenomenon known as alert fatigue. Large collections of imprecise sensors, an inability to adapt to known false positives, evolution of the threat landscape, and inefficient use of analyst time all contribute to the alert fatigue problem. To combat these issues, we present That Escalated Quickly (TEQ), a machine learning framework that reduces alert fatigue with minimal changes to SOC workflows by predicting alert-level and incident-level actionability. On real-world data, the system reduces the time it takes to respond to actionable incidents by 22.9%, suppresses 54% of false positives with a 95.1% detection rate, and reduces the number of alerts an analyst needs to investigate within singular incidents by 14%. Comment: Submitted to USENIX Security Symposium
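
    A minimal sketch of the general idea of actionability-based alert triage: score alerts with a classifier and suppress those below a threshold chosen for a target detection rate. The synthetic features, classifier choice, and threshold rule below are assumptions for illustration, not the TEQ implementation.

```python
# Generic sketch of alert prioritization: train a classifier on historical
# alerts labeled actionable/not, then pick a score threshold that keeps a
# target detection (recall) rate while suppressing low-scoring alerts.
# Features, model, and data here are placeholders, not the TEQ system.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))                       # stand-in alert features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000) > 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

# Choose the highest threshold that still recalls >= 95% of actionable alerts.
target_recall = 0.95
pos_scores = np.sort(scores[y_te == 1])
threshold = pos_scores[int(np.floor((1 - target_recall) * len(pos_scores)))]

suppressed = (scores < threshold) & (y_te == 0)
print(f"suppressed false positives: {suppressed.sum()} / {(y_te == 0).sum()}")
```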

    A Language-Agnostic Model for Semantic Source Code Labeling

    Code search and comprehension have become more difficult in recent years due to the rapid expansion of available source code. Current tools lack a way to label arbitrary code at scale while maintaining up-to-date representations of new programming languages, libraries, and functionalities. Comprehensive labeling of source code enables users to search for documents of interest and obtain a high-level understanding of their contents. We use Stack Overflow code snippets and their tags to train a language-agnostic, deep convolutional neural network to automatically predict semantic labels for source code documents. On Stack Overflow code snippets, we demonstrate a mean area under the ROC curve of 0.957 over a long-tailed list of 4,508 tags. We also manually validate the model outputs on a diverse set of unlabeled source code documents retrieved from GitHub, and we obtain a top-1 accuracy of 86.6%. This strongly indicates that the model successfully transfers its knowledge from Stack Overflow snippets to arbitrary source code documents. Comment: MASES 2018 Publication
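
    A minimal sketch of a character-level convolutional network for multi-label tag prediction, in the spirit of the model described above. The layer sizes, sequence length, and vocabulary are assumptions, not the published architecture; only the 4,508-tag output dimension comes from the abstract.

```python
# Minimal character-level CNN for multi-label code tagging (sketch only).
# Layer sizes, sequence length, and vocabulary are illustrative assumptions.
import torch
import torch.nn as nn

class CharCNNTagger(nn.Module):
    def __init__(self, vocab_size=128, embed_dim=16, num_tags=4508):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Sequential(
            nn.Conv1d(embed_dim, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),                 # global max over sequence
        )
        self.out = nn.Linear(128, num_tags)

    def forward(self, char_ids):                     # (batch, seq_len) int64
        x = self.embed(char_ids).transpose(1, 2)     # (batch, embed, seq_len)
        x = self.conv(x).squeeze(-1)                 # (batch, 128)
        return self.out(x)                           # per-tag logits

model = CharCNNTagger()
snippet = torch.randint(0, 128, (2, 512))            # two fake code documents
logits = model(snippet)
probs = torch.sigmoid(logits)                        # independent tag scores
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros_like(logits))
print(probs.shape, loss.item())
```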

    Stan: A Probabilistic Programming Language

    Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.
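
    As noted above, Stan can be called from Python through the pystan package. A minimal usage sketch for a toy normal model follows; it assumes the pystan 2.x interface (StanModel/sampling), which later pystan releases renamed.

```python
# Toy pystan example: full Bayesian inference for a normal mean and scale.
# Uses the pystan 2.x interface; treat the calls as version-dependent.
import pystan

model_code = """
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);   // log density accumulated imperatively
}
"""

data = {"N": 5, "y": [1.2, 0.7, 1.9, 1.1, 0.4]}
model = pystan.StanModel(model_code=model_code)       # compiles the model
fit = model.sampling(data=data, iter=2000, chains=4)  # NUTS by default
print(fit)                                            # posterior summaries
```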

    Possible Disintegrating Short-Period Super-Mercury Orbiting KIC 12557548

    We report here on the discovery of stellar occultations, observed with Kepler, that recur periodically at 15.685 hour intervals, but which vary in depth from a maximum of 1.3% to a minimum that can be less than 0.2%. The star that is apparently being occulted is KIC 12557548, a K dwarf with T_eff = 4400 K and V = 16. Because the eclipse depths are highly variable, they cannot be due solely to transits of a single planet with a fixed size. We discuss but dismiss a scenario involving a binary giant planet whose mutual orbit plane precesses, bringing one of the planets into and out of a grazing transit. We also briefly consider an eclipsing binary that either orbits KIC 12557548 in a hierarchical triple configuration or is nearby on the sky, but we find such a scenario inadequate to reproduce the observations. We come down in favor of an explanation that involves macroscopic particles escaping the atmosphere of a slowly disintegrating planet not much larger than Mercury. The particles could take the form of micron-sized pyroxene or aluminum oxide dust grains. The planetary surface is hot enough to sublimate and create a high-Z atmosphere; this atmosphere may be loaded with dust via cloud condensation or explosive volcanism. Atmospheric gas escapes the planet via a Parker-type thermal wind, dragging dust grains with it. We infer a mass loss rate from the observations of order 1 M_E/Gyr, with a dust-to-gas ratio possibly of order unity. For our fiducial 0.1 M_E planet, the evaporation timescale may be ~0.2 Gyr. Smaller mass planets are disfavored because they evaporate still more quickly, as are larger mass planets because they have surface gravities too strong to sustain outflows with the requisite mass-loss rates. The occultation profile evinces an ingress-egress asymmetry that could reflect a comet-like dust tail trailing the planet; we present simulations of such a tail. Comment: 14 pages, 7 figures; submitted to ApJ, January 10, 2012; accepted March 21, 2012
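
    The argument that a single fixed-size planet cannot explain the observations rests on the standard small-planet approximation, in which eclipse depth scales as the square of the occulting-to-stellar radius ratio. A quick check with the quoted depths illustrates the point; this is a back-of-the-envelope illustration, not the paper's dust-tail model.

```python
# Quick check of the fixed-size-planet argument: in the small-planet
# approximation, eclipse depth ~ (R_occ / R_star)^2, so the quoted depth
# range implies an occulting body whose effective radius changes by ~2.5x.
import math

depths = {"max": 0.013, "min": 0.002}                # 1.3% and 0.2% depths
radius_ratio = {k: math.sqrt(d) for k, d in depths.items()}
print(radius_ratio)                                  # ~0.114 vs ~0.045
print(radius_ratio["max"] / radius_ratio["min"])     # ~2.5x change in size
```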

    Systematizing Confidence in Open Research and Evidence (SCORE)

    Assessing the credibility of research claims is a central, continuous, and laborious part of the scientific process. Credibility assessment strategies range from expert judgment to aggregating existing evidence to systematic replication efforts. Such assessments can require substantial time and effort. Research progress could be accelerated if there were rapid, scalable, accurate credibility indicators to guide attention and resource allocation for further assessment. The SCORE program is creating and validating algorithms to provide confidence scores for research claims at scale. To investigate the viability of scalable tools, teams are creating: a database of claims from papers in the social and behavioral sciences; expert and machine generated estimates of credibility; and evidence of reproducibility, robustness, and replicability to validate the estimates. Beyond the primary research objective, the data and artifacts generated from this program will be openly shared and provide an unprecedented opportunity to examine research credibility and evidence.

    A New Approach for Assessment of Mental Architecture: Repeated Tagging

    A new approach is proposed to the study of a relatively neglected property of mental architecture: whether and when the already-processed elements are separated from the to-be-processed elements. The process of numerical proportion discrimination between two sets of elements defined either by color or by orientation can be described as sampling with or without replacement (characterized by binomial or hypergeometric probability distributions, respectively), depending on whether an element can be tagged only once or repeatedly. All empirical psychometric functions were approximated by a theoretical model showing that the ability to keep track of the already-tagged elements is not an inflexible part of the mental architecture but rather an individually variable strategy that also depends on the conspicuity of perceptual attributes. Strong evidence is provided that in a considerable number of trials, observers tagged the same element repeatedly, which can only be done serially at two separate time moments.
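
    The sampling-with-replacement versus sampling-without-replacement contrast corresponds to binomial versus hypergeometric distributions. The short sketch below compares the two probability mass functions; the display size and sample count are illustrative assumptions, not values from the study.

```python
# Compare the two tagging regimes discussed above: tagging elements with
# replacement (binomial) versus without replacement (hypergeometric).
# The display size and sample count are illustrative assumptions.
from scipy.stats import binom, hypergeom

N_total = 20      # total elements in the display
N_red = 12        # elements of the target color
n_tagged = 8      # number of tagging operations / samples

k = range(n_tagged + 1)
with_replacement = binom.pmf(k, n_tagged, N_red / N_total)
without_replacement = hypergeom.pmf(k, N_total, N_red, n_tagged)

for ki, p_b, p_h in zip(k, with_replacement, without_replacement):
    print(f"k={ki}: binomial={p_b:.3f}  hypergeometric={p_h:.3f}")
# The hypergeometric PMF is narrower: keeping track of already-tagged
# elements reduces the variance of the estimated proportion.
```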